Preprocessing¶
In [78]:
using CSV, DataFrames, Statistics, Plots, CategoricalArrays, Dates, Random, Gurobi, JuMP
In [75]:
data = CSV.read("MLOpt_data.csv", DataFrame)
y = data."No-show"
X = select(data, [:Gender_binary, :Age_std, :Scholarship, :Hipertension, :Diabetes, :Alcoholism, :Handcap, :SMS_received, :lead_time_std, :day_of_week, :appointment_month])
X_features = select(data, [:Gender_binary, :Age_std, :Scholarship, :Hipertension, :Diabetes, :Alcoholism, :Handcap, :SMS_received, :lead_time_std, :day_of_week, :appointment_month, :Neighbourhood])
110458×12 DataFrame
110433 rows omitted
| Row | Gender_binary | Age_std | Scholarship | Hipertension | Diabetes | Alcoholism | Handcap | SMS_received | lead_time_std | day_of_week | appointment_month | Neighbourhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | Float64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Float64 | String15 | String7 | String31 |
| 1 | 1 | 1.07858 | 0 | 1 | 0 | 0 | 0 | 0 | -0.667653 | Friday | April | JARDIM DA PENHA |
| 2 | 0 | 0.818894 | 0 | 0 | 0 | 0 | 0 | 0 | -0.667653 | Friday | April | JARDIM DA PENHA |
| 3 | 1 | 1.07858 | 0 | 0 | 0 | 0 | 0 | 0 | -0.667653 | Friday | April | MATA DA PRAIA |
| 4 | 1 | -1.25863 | 0 | 0 | 0 | 0 | 0 | 0 | -0.667653 | Friday | April | PONTAL DE CAMBURI |
| 5 | 1 | 0.818894 | 0 | 1 | 1 | 0 | 0 | 0 | -0.667653 | Friday | April | JARDIM DA PENHA |
| 6 | 1 | 1.68453 | 0 | 1 | 0 | 0 | 0 | 0 | -0.536562 | Friday | April | REPÚBLICA |
| 7 | 1 | -0.6094 | 0 | 0 | 0 | 0 | 0 | 0 | -0.536562 | Friday | April | GOIABEIRAS |
| 8 | 1 | 0.0831057 | 0 | 0 | 0 | 0 | 0 | 0 | -0.536562 | Friday | April | GOIABEIRAS |
| 9 | 1 | -0.695964 | 0 | 0 | 0 | 0 | 0 | 0 | -0.667653 | Friday | April | ANDORINHAS |
| 10 | 1 | -0.782527 | 0 | 0 | 0 | 0 | 0 | 0 | -0.536562 | Friday | April | CONQUISTA |
| 11 | 1 | -0.306429 | 0 | 0 | 0 | 0 | 0 | 0 | -0.536562 | Friday | April | NOVA PALESTINA |
| 12 | 0 | -0.349711 | 0 | 0 | 0 | 0 | 0 | 1 | -0.471016 | Friday | April | NOVA PALESTINA |
| 13 | 1 | -0.652682 | 1 | 0 | 0 | 0 | 0 | 0 | -0.602107 | Friday | April | NOVA PALESTINA |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 110447 | 0 | -0.176584 | 0 | 1 | 0 | 0 | 0 | 0 | -0.536562 | Wednesday | June | MARIA ORTIZ |
| 110448 | 1 | -0.00345758 | 0 | 0 | 0 | 0 | 0 | 0 | -0.602107 | Wednesday | June | MARIA ORTIZ |
| 110449 | 1 | -0.782527 | 0 | 0 | 0 | 0 | 0 | 0 | -0.667653 | Tuesday | June | MARIA ORTIZ |
| 110450 | 1 | 0.559204 | 0 | 0 | 0 | 0 | 0 | 1 | 2.01971 | Tuesday | June | MARIA ORTIZ |
| 110451 | 1 | -0.652682 | 0 | 0 | 0 | 0 | 0 | 1 | 2.01971 | Tuesday | June | MARIA ORTIZ |
| 110452 | 1 | 0.212951 | 0 | 0 | 0 | 0 | 0 | 1 | 1.62644 | Tuesday | June | MARIA ORTIZ |
| 110453 | 1 | 0.689049 | 0 | 0 | 0 | 0 | 0 | 1 | 1.62644 | Tuesday | June | MARIA ORTIZ |
| 110454 | 1 | 0.818894 | 0 | 0 | 0 | 0 | 0 | 1 | 1.62644 | Tuesday | June | MARIA ORTIZ |
| 110455 | 1 | 0.602485 | 0 | 0 | 0 | 0 | 0 | 1 | 1.62644 | Tuesday | June | MARIA ORTIZ |
| 110456 | 1 | -0.695964 | 0 | 0 | 0 | 0 | 0 | 1 | 2.01971 | Tuesday | June | MARIA ORTIZ |
| 110457 | 1 | 0.0398241 | 0 | 0 | 0 | 0 | 0 | 1 | 2.01971 | Tuesday | June | MARIA ORTIZ |
| 110458 | 1 | 0.73233 | 0 | 0 | 0 | 0 | 0 | 1 | 2.01971 | Tuesday | June | MARIA ORTIZ |
In [76]:
for col in [:Gender_binary, :Scholarship, :Hipertension, :Diabetes, :Alcoholism, :Handcap, :SMS_received, :day_of_week, :appointment_month]
    X[!, col] = CategoricalArray(X[!, col])
    X_features[!, col] = CategoricalArray(X_features[!, col])
end
X_features.Neighbourhood = CategoricalArray(X_features.Neighbourhood)
110458-element CategoricalArray{String31,1,UInt32}:
String31("JARDIM DA PENHA")
String31("JARDIM DA PENHA")
String31("MATA DA PRAIA")
String31("PONTAL DE CAMBURI")
String31("JARDIM DA PENHA")
String31("REPÚBLICA")
String31("GOIABEIRAS")
String31("GOIABEIRAS")
String31("ANDORINHAS")
String31("CONQUISTA")
⋮
String31("MARIA ORTIZ")
String31("MARIA ORTIZ")
String31("MARIA ORTIZ")
String31("MARIA ORTIZ")
String31("MARIA ORTIZ")
String31("MARIA ORTIZ")
String31("MARIA ORTIZ")
String31("MARIA ORTIZ")
String31("MARIA ORTIZ")
In [77]:
seed = 15095
(X_trainvalid, y_trainvalid), (X_test, y_test) = IAI.split_data(:classification, X, y, seed=seed, train_proportion=0.8)
(X_train, y_train), (X_valid, y_valid) = IAI.split_data(:classification, X_trainvalid, y_trainvalid, seed=seed, train_proportion=0.8);
(Xfeat_trainvalid, yfeat_trainvalid), (Xfeat_test, yfeat_test) = IAI.split_data(:classification, X_features, y, seed=seed, train_proportion=0.8)
(Xfeat_train, yfeat_train), (Xfeat_valid, yfeat_valid) = IAI.split_data(:classification, Xfeat_trainvalid, yfeat_trainvalid, seed=seed, train_proportion=0.8);
Feature Selection¶
In [ ]:
grid = IAI.GridSearch(
IAI.OptimalFeatureSelectionClassifier(
random_seed=seed,
),
sparsity=1:12,
)
IAI.fit!(grid, Xfeat_trainvalid, Array(yfeat_trainvalid), validation_criterion=:auc)
All Grid Results:

| Row | sparsity | train_score | valid_score | rank_valid_score |
|---|---|---|---|---|
| 1 | 1 | 0.0309003 | 0.69331 | 1 |
| 2 | 2 | 0.0353047 | 0.656788 | 12 |
| 3 | 3 | 0.038904 | 0.65837 | 10 |
| 4 | 4 | 0.040399 | 0.659949 | 4 |
| 5 | 5 | 0.0415989 | 0.658556 | 7 |
| 6 | 6 | 0.0421186 | 0.658296 | 11 |
| 7 | 7 | 0.0426327 | 0.658424 | 9 |
| 8 | 8 | 0.0429795 | 0.65862 | 6 |
| 9 | 9 | 0.0431705 | 0.658543 | 8 |
| 10 | 10 | 0.0435749 | 0.659456 | 5 |
| 11 | 11 | 0.0439266 | 0.661078 | 3 |
| 12 | 12 | 0.0442522 | 0.661305 | 2 |

Best Params: sparsity => 1

Best Model - Fitted OptimalFeatureSelectionClassifier:
Constant: -1.38275
Weights:
lead_time_std: 0.384818
(Higher score indicates stronger prediction for class `1`)
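With sparsity = 1 the selected model reduces to a single linear score, -1.38275 + 0.384818 * lead_time_std, where a higher score indicates class 1 (no-show). A minimal Python sketch of how such a score maps to a probability, assuming a logistic link (the output above does not show the learner's actual link function, so treat this as illustrative):

```python
import math

# Reported sparsity-1 model: constant plus a single weight on lead_time_std.
CONST, W = -1.38275, 0.384818

def p_noshow(lead_time_std):
    """Score -> probability of class 1 under an assumed logistic link."""
    score = CONST + W * lead_time_std
    return 1 / (1 + math.exp(-score))

# Longer standardized lead times should raise the predicted no-show risk.
assert p_noshow(2.0) > p_noshow(0.0) > p_noshow(-0.67)
```

This matches the intuition from the grid: lead time alone already carries most of the signal, which is why adding more features does not improve validation AUC.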
In [66]:
plot(grid, type=:validation)
In [67]:
IAI.variable_importance(IAI.get_learner(grid))
98×2 DataFrame
73 rows omitted
| Row | Feature | Importance |
|---|---|---|
| | Symbol | Float64 |
| 1 | lead_time_std | 1.0 |
| 2 | Age_std | 0.0 |
| 3 | Alcoholism=1 | 0.0 |
| 4 | Diabetes=1 | 0.0 |
| 5 | Gender_binary=1 | 0.0 |
| 6 | Handcap=0 | 0.0 |
| 7 | Handcap=1 | 0.0 |
| 8 | Handcap=2 | 0.0 |
| 9 | Handcap=3 | 0.0 |
| 10 | Handcap=4 | 0.0 |
| 11 | Hipertension=1 | 0.0 |
| 12 | Neighbourhood=ANDORINHAS | 0.0 |
| 13 | Neighbourhood=ANTÔNIO HONÓRIO | 0.0 |
| ⋮ | ⋮ | ⋮ |
| 87 | Neighbourhood=VILA RUBIM | 0.0 |
| 88 | SMS_received=1 | 0.0 |
| 89 | Scholarship=1 | 0.0 |
| 90 | appointment_month=April | 0.0 |
| 91 | appointment_month=June | 0.0 |
| 92 | appointment_month=May | 0.0 |
| 93 | day_of_week=Friday | 0.0 |
| 94 | day_of_week=Monday | 0.0 |
| 95 | day_of_week=Saturday | 0.0 |
| 96 | day_of_week=Thursday | 0.0 |
| 97 | day_of_week=Tuesday | 0.0 |
| 98 | day_of_week=Wednesday | 0.0 |
Predictions¶
RF¶
In [5]:
# TODO Train a Random Forest
# Grid search for hyperparameter tuning
max_depths = [15]
num_trees = [200]
minbuckets = [100, 200]
rf_grid = IAI.GridSearch(
IAI.RandomForestClassifier(
random_seed=seed,
max_categoric_levels_before_warning=10000,
),
max_depth=max_depths,
num_trees=num_trees,
minbucket=minbuckets,
)
# Fit training data
IAI.fit_cv!(rf_grid, X_trainvalid, Array(y_trainvalid), validation_criterion=:auc, n_folds=3)
# Calculate AUC score on the validation set (name the variable holding the AUC score as `val_auc`)
val_auc = IAI.score(rf_grid, X_valid, y_valid, criterion=:auc)
┌ Warning: This copy of Interpretable AI software is for academic purposes only and not for commercial use.
└ @ nothing nothing:nothing
┌ Warning: Interpretable AI license expires soon: 2025-12-31T00:00:00. If you need to renew, please send us the following machine ID:
│ 1a2559dd14033a19804c4627f90f114af9c4fbd0d9ee19e3281af640f42d83a2
└ @ nothing nothing:nothing
0.7589772832042827
In [6]:
# TODO: Get best model
model = rf_grid
lnr_rf = IAI.get_learner(model)
# TODO: Display selected model's parameters
params = IAI.get_params(lnr_rf)
println(params)
println("Max depth of the best model is ", params[:max_depth])
println("Number of trees of the best model is ", params[:num_trees])
println("Minbucket size of the best model is ", params[:minbucket])
Dict{Symbol, Any}(:fast_num_support_restarts => 0, :max_categoric_levels_before_warning => 10000, :split_features => All{Tuple{}}(()), :fast_test_intermediate_support => true, :num_trees => 200, :parallel_processes => nothing, :num_threads => nothing, :max_features => :auto, :show_progress => false, :checkpoint_max_files => nothing, :max_depth => 15, :hyperplane_config => NamedTuple[], :checkpoint_interval => 10, :regression_features => Any[], :weighted_minbucket => 1, :treat_unknown_level_missing => false, :random_seed => 15095, :checkpoint_dir => nothing, :fast_cumulative_support => true, :missingdatamode => :none, :refit_learner => nothing, :checkpoint_file => nothing, :fast_num_support_iterations => 1, :normalize_X => true, :bootstrap_samples => true, :criterion => :gini, :minbucket => 100, :cp_tuning_se_tolerance => 0.0, :cp => 0)
Max depth of the best model is 15
Number of trees of the best model is 200
Minbucket size of the best model is 100
In [7]:
# TODO: Calculate AUC score for test data
auc_score_rf = IAI.score(model, X_test, y_test, criterion=:auc)
# Accuracy on the test set
acc_rf = IAI.score(model, X_test, y_test, criterion=:misclassification)
println("Test AUC score: ", round(auc_score_rf, digits=4))
println("Test Accuracy: ", round(acc_rf, digits=4))
Test AUC score: 0.7331
Test Accuracy: 0.7978
CART¶
In [8]:
max_depths = [12,15,18]
minbuckets = [100,200]
grid_cart = IAI.GridSearch(
IAI.OptimalTreeClassifier(
random_seed=seed,
localsearch = false,
),
max_depth=max_depths,
minbucket=minbuckets,
criterion=:gini
)
IAI.fit_cv!(grid_cart, X_trainvalid, Array(y_trainvalid), validation_criterion=:auc, n_folds=3)
In [9]:
# TODO: Get best model
model = grid_cart
lnr_cart = IAI.get_learner(model)
# TODO: Display selected model's parameters
params = IAI.get_params(lnr_cart)
println(params)
println("Max depth of the best model is ", params[:max_depth])
println("Minbucket size of the best model is ", params[:minbucket])
Dict{Symbol, Any}(:localsearch => false, :fast_num_support_restarts => 0, :max_categoric_levels_before_warning => 10, :split_features => All{Tuple{}}(()), :fast_test_intermediate_support => true, :ls_ignore_errors => false, :ls_num_hyper_restarts => 5, :max_depth => 12, :parallel_processes => nothing, :num_threads => nothing, :hyperplane_config => NamedTuple[], :show_progress => false, :ls_warmstart_criterion => :gini, :checkpoint_interval => 10, :ls_max_hyper_iterations => 9223372036854775807, :regression_features => Any[], :checkpoint_max_files => nothing, :weighted_minbucket => 1, :ls_bootstrap_samples => false, :treat_unknown_level_missing => false, :random_seed => 15095, :checkpoint_dir => nothing, :ls_num_greedy_features => :auto, :fast_cumulative_support => true, :missingdatamode => :none, :refit_learner => nothing, :fast_num_support_iterations => 1, :checkpoint_file => nothing, :normalize_X => true, :ls_num_categoric_restarts => 10, :criterion => :gini, :minbucket => 200, :ls_max_search_iterations => 9223372036854775807, :ls_scan_reverse_split => false, :ls_num_tree_restarts => 100, :cp_tuning_se_tolerance => 0.0, :cp => 0.0002150480321117902)
Max depth of the best model is 12
Minbucket size of the best model is 200
In [10]:
# TODO: Calculate AUC score for test data
auc_score_cart = IAI.score(model, X_test, y_test, criterion=:auc)
# Accuracy on the test set
acc_cart = IAI.score(model, X_test, y_test, criterion=:misclassification)
println("Test AUC score: ", round(auc_score_cart, digits=4))
println("Test Accuracy: ", round(acc_cart, digits=4))
Test AUC score: 0.724
Test Accuracy: 0.7981
OCT¶
In [11]:
# Train an OCT, this grid search might take some time
# Grid search for hyperparameter tuning
# TODO:
max_depths = [4,6,8]
minbuckets = [100, 200]
grid_oct = IAI.GridSearch(
IAI.OptimalTreeClassifier(
random_seed=seed,
max_categoric_levels_before_warning=10000,
),
max_depth=max_depths,
minbucket=minbuckets,
criterion=:gini,
)
IAI.fit_cv!(grid_oct, X_trainvalid, Array(y_trainvalid), validation_criterion=:auc, n_folds=3)
In [12]:
# TODO: Get best model
model = grid_oct
lnr_oct = IAI.get_learner(model)
# TODO: Display selected model's parameters
params = IAI.get_params(lnr_oct)
println(params)
println("Max depth of the best model is ", params[:max_depth])
println("Minbucket size of the best model is ", params[:minbucket])
Dict{Symbol, Any}(:localsearch => true, :fast_num_support_restarts => 0, :max_categoric_levels_before_warning => 10000, :split_features => All{Tuple{}}(()), :fast_test_intermediate_support => true, :ls_ignore_errors => false, :ls_num_hyper_restarts => 5, :max_depth => 6, :parallel_processes => nothing, :num_threads => nothing, :hyperplane_config => NamedTuple[], :show_progress => false, :ls_warmstart_criterion => :gini, :checkpoint_interval => 10, :ls_max_hyper_iterations => 9223372036854775807, :regression_features => Any[], :checkpoint_max_files => nothing, :weighted_minbucket => 1, :ls_bootstrap_samples => false, :treat_unknown_level_missing => false, :random_seed => 15095, :checkpoint_dir => nothing, :ls_num_greedy_features => :auto, :fast_cumulative_support => true, :missingdatamode => :none, :refit_learner => nothing, :fast_num_support_iterations => 1, :checkpoint_file => nothing, :normalize_X => true, :ls_num_categoric_restarts => 10, :criterion => :gini, :minbucket => 200, :ls_max_search_iterations => 9223372036854775807, :ls_scan_reverse_split => false, :ls_num_tree_restarts => 100, :cp_tuning_se_tolerance => 0.0, :cp => 1.3361958865643064e-5)
Max depth of the best model is 6
Minbucket size of the best model is 200
In [13]:
# TODO: Calculate AUC score for test data
auc_score_oct = IAI.score(model, X_test, y_test, criterion=:auc)
# Accuracy on the test set
acc_oct = IAI.score(model, X_test, y_test, criterion=:misclassification)
println("Test AUC score: ", round(auc_score_oct, digits=4))
println("Test Accuracy: ", round(acc_oct, digits=4))
Test AUC score: 0.7253
Test Accuracy: 0.7981
Prescription¶
In [ ]:
"""
prescriptive_policy_from_model(lnr, X_test)
Given a trained IAI classifier `lnr` (RF, CART, or OCT) and a test
feature matrix `X_test`, compute:
- optimal overbooking decisions z ∈ {0,1} per test point
- expected cost under baseline (z = 0)
- expected cost under the prescriptive policy
- cost savings per slot
- expected patients seen per slot (baseline and policy)
- extra patients seen per slot
Assumes labels were originally encoded as:
y = 1 (no-show), y = 0 (show)
"""
function prescriptive_policy_from_model(lnr, X_test;
c_idle::Float64 = 150.0,   # cost of an idle slot when the patient no-shows (BRL)
c_delay::Float64 = 75.0)   # cost of overtime/delay when an overbooked patient shows (BRL)
# Predicted probabilities from the model
# proba[:,1] = P(y=0), proba[:,2] = P(y=1)
proba = IAI.predict_proba(lnr, X_test)
# P(no-show) = proba[:,2], P(show) = 1 - P(no-show)
p_no = proba[:, 2]
p_show = 1 .- p_no
n = length(p_show)
# Decision and metrics
z = Vector{Int}(undef, n) # 0 = no overbook, 1 = overbook
cost_baseline = Vector{Float64}(undef, n)
cost_policy = Vector{Float64}(undef, n)
seen_baseline = Vector{Float64}(undef, n)
seen_policy = Vector{Float64}(undef, n)
for i in 1:n
p = p_show[i]
# expected cost from the prescriptive ML cost c(z; y_i)
# cost0 = 150 * P(no-show) = 150 * (1 - p)
# cost1 = 75 * P(show) = 75 * p
cost0 = c_idle * (1 - p)
cost1 = c_delay * p
# pick the cheaper decision
if cost1 < cost0
z[i] = 1
cost_policy[i] = cost1
else
z[i] = 0
cost_policy[i] = cost0
end
# baseline always uses z = 0
cost_baseline[i] = cost0
# expected patients seen:
# baseline: book 1 → E[seen] = p
seen_baseline[i] = p
# policy: book b = 1 + z[i] → E[seen] = 1 - (1 - p)^b
b = 1 + z[i]
seen_policy[i] = 1 - (1 - p)^b
end
avg_cost_baseline = mean(cost_baseline)
avg_cost_policy = mean(cost_policy)
cost_savings = avg_cost_baseline - avg_cost_policy
avg_seen_baseline = mean(seen_baseline)
avg_seen_policy = mean(seen_policy)
extra_seen = avg_seen_policy - avg_seen_baseline
return (
z = z,
avg_cost_baseline = avg_cost_baseline,
avg_cost_policy = avg_cost_policy,
cost_savings = cost_savings,
avg_seen_baseline = avg_seen_baseline,
avg_seen_policy = avg_seen_policy,
extra_seen = extra_seen,
)
end
prescriptive_policy_from_model
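Because the per-slot comparison is just c_idle * (1 - p) versus c_delay * p, the loop above implements a threshold rule: overbook exactly when P(show) < c_idle / (c_idle + c_delay) = 150 / 225 = 2/3. A quick Python check of this equivalence (a sketch using the same costs as in the comments above):

```python
# Per-slot overbooking decision: compare expected idle cost (z = 0)
# against expected delay cost (z = 1), given p = P(show).
C_IDLE, C_DELAY = 150.0, 75.0

def decide(p_show):
    cost0 = C_IDLE * (1 - p_show)   # z = 0: pay only if the patient no-shows
    cost1 = C_DELAY * p_show        # z = 1: pay only if the patient shows
    return 1 if cost1 < cost0 else 0

# Closed-form break-even probability: overbook iff p_show < this threshold.
threshold = C_IDLE / (C_IDLE + C_DELAY)   # = 2/3

# The pairwise cost comparison and the threshold rule agree everywhere.
for p in [i / 100 for i in range(101)]:
    assert decide(p) == (1 if p < threshold else 0)
```

So the policy overbooks any slot whose predicted show probability is below 2/3, i.e. whose predicted no-show probability exceeds 1/3.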
In [54]:
res_oct = prescriptive_policy_from_model(lnr_oct, X_test)
res_cart = prescriptive_policy_from_model(lnr_cart, X_test)
res_rf = prescriptive_policy_from_model(lnr_rf, X_test)
println("=== Prescriptive Results (Expected Cost, BRL per slot) ===")
println("OCT: baseline = ", round(res_oct.avg_cost_baseline, digits=2),
", policy = ", round(res_oct.avg_cost_policy, digits=2),
", savings = ", round(res_oct.cost_savings, digits=2))
println("CART: baseline = ", round(res_cart.avg_cost_baseline, digits=2),
", policy = ", round(res_cart.avg_cost_policy, digits=2),
", savings = ", round(res_cart.cost_savings, digits=2))
println("RF: baseline = ", round(res_rf.avg_cost_baseline, digits=2),
", policy = ", round(res_rf.avg_cost_policy, digits=2),
", savings = ", round(res_rf.cost_savings, digits=2))
println("\n=== Expected Patients Seen per Slot ===")
println("OCT: baseline = ", round(res_oct.avg_seen_baseline, digits=4),
", policy = ", round(res_oct.avg_seen_policy, digits=4),
", extra = ", round(res_oct.extra_seen, digits=4))
println("CART: baseline = ", round(res_cart.avg_seen_baseline, digits=4),
", policy = ", round(res_cart.avg_seen_policy, digits=4),
", extra = ", round(res_cart.extra_seen, digits=4))
println("RF: baseline = ", round(res_rf.avg_seen_baseline, digits=4),
", policy = ", round(res_rf.avg_seen_policy, digits=4),
", extra = ", round(res_rf.extra_seen, digits=4))
=== Prescriptive Results (Expected Cost, BRL per slot) ===
OCT: baseline = 30.24, policy = 28.03, savings = 2.21
CART: baseline = 30.34, policy = 28.56, savings = 1.78
RF: baseline = 30.32, policy = 28.44, savings = 1.88

=== Expected Patients Seen per Slot ===
OCT: baseline = 0.7984, policy = 0.8429, extra = 0.0445
CART: baseline = 0.7977, policy = 0.8388, extra = 0.041
RF: baseline = 0.7978, policy = 0.8367, extra = 0.0388
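The patients-seen numbers come from E[seen] = 1 - (1 - p)^b with b = 1 + z bookings per slot: the probability that at least one of the b booked patients shows, assuming independent shows with the same probability p and a capacity of one patient per slot. A small Python sketch of the formula under those assumptions:

```python
# Expected patients seen in a single-capacity slot with b independent
# bookings, each showing with probability p: P(at least one shows).
def expected_seen(p_show, b):
    return 1 - (1 - p_show) ** b

# Baseline books one patient (b = 1); overbooking books two (b = 2).
p = 0.8
base = expected_seen(p, 1)   # = p = 0.8
over = expected_seen(p, 2)   # = 1 - 0.2**2 = 0.96
assert abs(base - 0.8) < 1e-12
assert abs(over - 0.96) < 1e-12
```

Overbooking can therefore never decrease expected patients seen per slot, which is why "extra" is positive for all three models.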
Prescription via MIO (Gurobi)¶
In [ ]:
const C_IDLE = 150.0
const C_DELAY = 75.0
# y_train: 1 = no-show, 0 = show
function leaf_costs_from_training_tree(lnr::IAI.TreeLearner,
X_train,
y_train;
c_idle::Float64 = C_IDLE,
c_delay::Float64 = C_DELAY)
node_inds = IAI.apply_nodes(lnr, X_train)
num_nodes = length(node_inds)
leaf_cost0 = zeros(Float64, num_nodes)
leaf_cost1 = zeros(Float64, num_nodes)
leaf_p_show = zeros(Float64, num_nodes)
for t in 1:num_nodes
inds = node_inds[t]
if isempty(inds) || !IAI.is_leaf(lnr, t)
continue
end
n_leaf = length(inds)
num_no = sum(y_train[inds] .== 1)
num_show = n_leaf - num_no
frac_no = num_no / n_leaf
frac_show = num_show / n_leaf
leaf_cost0[t] = c_idle * frac_no
leaf_cost1[t] = c_delay * frac_show
leaf_p_show[t] = frac_show
end
return leaf_cost0, leaf_cost1, leaf_p_show
end
leaf_costs_from_training_tree (generic function with 1 method)
In [ ]:
function prescribe_tree_with_gurobi(lnr::IAI.TreeLearner,
X_train,
y_train,
X_test;
c_idle::Float64 = C_IDLE,
c_delay::Float64 = C_DELAY)
leaf_cost0, leaf_cost1, leaf_p_show =
leaf_costs_from_training_tree(lnr, X_train, y_train;
c_idle=c_idle, c_delay=c_delay)
leaf_test = IAI.apply(lnr, X_test)
n_test = length(leaf_test)
model = Model(Gurobi.Optimizer)
set_silent(model)
@variable(model, z[1:n_test], Bin)
@objective(model, Min,
sum( leaf_cost0[leaf_test[j]] * (1 - z[j]) +
leaf_cost1[leaf_test[j]] * z[j]
for j in 1:n_test)
)
optimize!(model)
z_opt = Int.(round.(value.(z)))
cost_base = [leaf_cost0[leaf_test[j]] for j in 1:n_test]
cost_pol = [leaf_cost0[leaf_test[j]] * (1 - z_opt[j]) +
leaf_cost1[leaf_test[j]] * z_opt[j]
for j in 1:n_test]
avg_cost_base = mean(cost_base)
avg_cost_pol = mean(cost_pol)
savings = avg_cost_base - avg_cost_pol
seen_base = Float64[]
seen_pol = Float64[]
for j in 1:n_test
p = leaf_p_show[leaf_test[j]]
push!(seen_base, p)
b = 1 + z_opt[j]
push!(seen_pol, 1 - (1 - p)^b)
end
avg_seen_base = mean(seen_base)
avg_seen_pol = mean(seen_pol)
extra_seen = avg_seen_pol - avg_seen_base
return (
z_opt = z_opt,
avg_cost_base = avg_cost_base,
avg_cost_pol = avg_cost_pol,
savings = savings,
avg_seen_base = avg_seen_base,
avg_seen_pol = avg_seen_pol,
extra_seen = extra_seen,
)
end
prescribe_tree_with_gurobi (generic function with 1 method)
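One observation about the MIO above: the objective sum_j [cost0_j * (1 - z_j) + cost1_j * z_j] is separable in the z_j with no coupling constraints, so the joint optimum is exactly the per-slot argmin, and the solver-based policy should agree with the closed-form rule used earlier. A Python sketch verifying the separability argument by brute force on a tiny instance (the costs are made-up illustrative numbers):

```python
from itertools import product

# Separable objective: each z_j only affects its own term, so the joint
# optimum just picks the cheaper of cost0_j (z_j = 0) and cost1_j (z_j = 1).
def solve_separable(cost0, cost1):
    z = [1 if c1 < c0 else 0 for c0, c1 in zip(cost0, cost1)]
    total = sum(c1 if zj else c0 for zj, c0, c1 in zip(z, cost0, cost1))
    return z, total

# Brute force over all 2^n assignments agrees with the per-slot rule.
cost0 = [50.0, 10.0, 30.0]
cost1 = [20.0, 40.0, 30.0]
z, total = solve_separable(cost0, cost1)
best = min(sum(c0 * (1 - zj) + c1 * zj
               for zj, c0, c1 in zip(cand, cost0, cost1))
           for cand in product([0, 1], repeat=3))
assert abs(total - best) < 1e-12
```

The Gurobi formulation still earns its keep if coupling constraints are added later (e.g. a cap on total overbooked slots per day), which the per-slot rule cannot handle.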
In [ ]:
function prescribe_rf_with_gurobi(rf_lnr,
X_test;
c_idle::Float64 = C_IDLE,
c_delay::Float64 = C_DELAY)
proba = IAI.predict_proba(rf_lnr, X_test)
p_no = proba[:, 2]
p_show = 1 .- p_no
n_test = length(p_show)
cost0 = c_idle .* (1 .- p_show)
cost1 = c_delay .* p_show
model = Model(Gurobi.Optimizer)
set_silent(model)
@variable(model, z[1:n_test], Bin)
@objective(model, Min,
sum(cost0[j] * (1 - z[j]) + cost1[j] * z[j] for j in 1:n_test)
)
optimize!(model)
z_opt = Int.(round.(value.(z)))
cost_base = cost0
cost_pol = [cost0[j] * (1 - z_opt[j]) + cost1[j] * z_opt[j]
for j in 1:n_test]
avg_cost_base = mean(cost_base)
avg_cost_pol = mean(cost_pol)
savings = avg_cost_base - avg_cost_pol
seen_base = p_show
seen_pol = [1 - (1 - p_show[j])^(1 + z_opt[j]) for j in 1:n_test]
avg_seen_base = mean(seen_base)
avg_seen_pol = mean(seen_pol)
extra_seen = avg_seen_pol - avg_seen_base
return (
z_opt = z_opt,
avg_cost_base = avg_cost_base,
avg_cost_pol = avg_cost_pol,
savings = savings,
avg_seen_base = avg_seen_base,
avg_seen_pol = avg_seen_pol,
extra_seen = extra_seen,
)
end
prescribe_rf_with_gurobi (generic function with 1 method)
In [ ]:
# Best learners from your grids
oct_lnr = lnr_oct
cart_lnr = lnr_cart
rf_lnr = lnr_rf
res_oct = prescribe_tree_with_gurobi(oct_lnr, X_train, y_train, X_test)
res_cart = prescribe_tree_with_gurobi(cart_lnr, X_train, y_train, X_test)
res_rf = prescribe_rf_with_gurobi(rf_lnr, X_test)
println("=== Expected Cost (BRL per slot) ===")
println("OCT: baseline = ", round(res_oct.avg_cost_base, digits=2),
", policy = ", round(res_oct.avg_cost_pol, digits=2),
", savings = ", round(res_oct.savings, digits=2))
println("CART: baseline = ", round(res_cart.avg_cost_base, digits=2),
", policy = ", round(res_cart.avg_cost_pol, digits=2),
", savings = ", round(res_cart.savings, digits=2))
println("RF: baseline = ", round(res_rf.avg_cost_base, digits=2),
", policy = ", round(res_rf.avg_cost_pol, digits=2),
", savings = ", round(res_rf.savings, digits=2))
println("\n=== Expected Patients Seen per Slot ===")
println("OCT: baseline = ", round(res_oct.avg_seen_base, digits=4),
", policy = ", round(res_oct.avg_seen_pol, digits=4),
", extra = ", round(res_oct.extra_seen, digits=4))
println("CART: baseline = ", round(res_cart.avg_seen_base, digits=4),
", policy = ", round(res_cart.avg_seen_pol, digits=4),
", extra = ", round(res_cart.extra_seen, digits=4))
println("RF: baseline = ", round(res_rf.avg_seen_base, digits=4),
", policy = ", round(res_rf.avg_seen_pol, digits=4),
", extra = ", round(res_rf.extra_seen, digits=4))
Set parameter Username
Set parameter LicenseID to value 2710905
Academic license - for non-commercial use only - expires 2026-09-20

=== Expected Cost (BRL per slot) ===
OCT: baseline = 30.24, policy = 28.06, savings = 2.18
CART: baseline = 30.35, policy = 28.63, savings = 1.72
RF: baseline = 30.32, policy = 28.44, savings = 1.88

=== Expected Patients Seen per Slot ===
OCT: baseline = 0.7984, policy = 0.8428, extra = 0.0444
CART: baseline = 0.7977, policy = 0.8386, extra = 0.0409
RF: baseline = 0.7978, policy = 0.8367, extra = 0.0388